Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
Authors
Abstract
We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically excellent performance, and it has been shown to achieve an optimal regret bound in the standard single-play MAB problem. In this paper, we propose the multiple-play Thompson sampling (MP-TS) algorithm, an extension of TS to the multiple-play MAB problem, and discuss its regret analysis. We prove that MP-TS for binary rewards has an optimal regret upper bound that matches the regret lower bound provided by Anantharam et al. (1987). Therefore, MP-TS is the first computationally efficient algorithm with optimal regret. A set of computer simulations was also conducted to compare MP-TS with state-of-the-art algorithms. We also propose a modification of MP-TS, which is shown to have better empirical performance.
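The MP-TS algorithm for binary rewards described above can be sketched roughly as follows: each arm keeps a Beta posterior, and at every round one sample is drawn per arm and the arms with the largest samples are played. This is a minimal illustrative sketch, not the authors' reference implementation; the function name `mp_ts` and the simulated Bernoulli arms are assumptions for the example.

```python
import random


def mp_ts(arm_means, n_select, n_rounds, seed=0):
    """Illustrative sketch of multiple-play Thompson sampling (MP-TS)
    for binary rewards, run against simulated Bernoulli arms.

    arm_means: true success probability of each simulated arm (for the demo).
    n_select:  number of arms played per round (the "multiple plays").
    """
    rng = random.Random(seed)
    k = len(arm_means)
    alpha = [1] * k  # Beta posterior successes + 1 (uniform Beta(1,1) prior)
    beta = [1] * k   # Beta posterior failures + 1
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one posterior sample per arm.
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Play the n_select arms with the largest samples.
        chosen = sorted(range(k), key=lambda i: theta[i], reverse=True)[:n_select]
        for i in chosen:
            reward = 1 if rng.random() < arm_means[i] else 0
            alpha[i] += reward
            beta[i] += 1 - reward
            total_reward += reward
    return total_reward, alpha, beta
```

With well-separated arm means, the posteriors concentrate and the top `n_select` arms end up being played in almost every round, which is the behavior the regret analysis quantifies.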
Similar Articles
Analysis of Thompson Sampling for the Multi-armed Bandit Problem
The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W. R. Thompson, dates back to 1933. This algorithm, referred to as Thompson Sampling, is a natural Bayesian algorithm. The basic idea is to choose an arm to pla...
Stochastic Regret Minimization via Thompson Sampling
The Thompson Sampling (TS) policy is a widely implemented algorithm for the stochastic multi-armed bandit (MAB) problem. Given a prior distribution over possible parameter settings of the underlying reward distributions of the arms, at each time instant, the policy plays an arm with probability equal to the probability that this arm has the largest mean reward conditioned on the current posterior di...
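The probability-matching property stated in this abstract can be checked empirically: the frequency with which TS selects an arm approximates the posterior probability that the arm has the largest mean. A small Monte Carlo sketch, assuming Beta posteriors given as `(alpha, beta)` pairs (the function name is illustrative):

```python
import random


def ts_selection_frequency(posteriors, n_draws=100_000, seed=1):
    """Estimate the probability that Thompson sampling plays each arm,
    given independent Beta(alpha, beta) posteriors per arm.

    Each draw samples one mean per arm and records the argmax, so the
    resulting frequencies approximate the posterior probability that
    each arm has the largest mean reward.
    """
    rng = random.Random(seed)
    k = len(posteriors)
    counts = [0] * k
    for _ in range(n_draws):
        theta = [rng.betavariate(a, b) for a, b in posteriors]
        counts[max(range(k), key=lambda i: theta[i])] += 1
    return [c / n_draws for c in counts]
```

For posteriors that are well separated, the frequency of the better arm should be close to one, matching the probability-matching description in the text.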
Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparis...
Analysis of Thompson Sampling for Stochastic Sleeping Bandits
We study a variant of the stochastic multi-armed bandit problem where the set of available arms varies arbitrarily with time (also known as the sleeping bandit problem). We focus on the Thompson Sampling algorithm and consider a regret notion defined with respect to the best available arm. Our main result is an O(log T) regret bound for Thompson Sampling, which generalizes a similar bound known ...
Near-optimal Regret Bounds for Thompson Sampling
Thompson Sampling (TS) is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated that it has favorable empirical performance compared to state-of-the-art methods. In this paper, a novel and almost tight martingale-based regret analysis for Thompson ...